Goto

Collaborating Authors

 stable distribution




Analytically deriving Partial Information Decomposition for affine systems of stable and convolution-closed distributions

Neural Information Processing Systems

Bivariate partial information decomposition (PID) has emerged as a promising tool for analyzing interactions in complex systems, particularly in neuroscience. PID achieves this by decomposing the information that two sources (e.g., different


Efficient Ensemble Conditional Independence Test Framework for Causal Discovery

Guan, Zhengkang, Kuang, Kun

arXiv.org Machine Learning

Constraint-based causal discovery relies on numerous conditional independence tests (CITs), but its practical applicability is severely constrained by the prohibitive computational cost, especially as CITs themselves have high time complexity with respect to the sample size. To address this key bottleneck, we introduce the Ensemble Conditional Independence Test (E-CIT), a general and plug-and-play framework. E-CIT operates on an intuitive divide-and-aggregate strategy: it partitions the data into subsets, applies a given base CIT independently to each subset, and aggregates the resulting p-values using a novel method grounded in the properties of stable distributions. This framework reduces the computational complexity of a base CIT to linear in the sample size when the subset size is fixed. Moreover, our tailored p-value combination method offers theoretical consistency guarantees under mild conditions on the subtests. Experimental results demonstrate that E-CIT not only significantly reduces the computational burden of CITs and causal discovery but also achieves competitive performance. Notably, it exhibits an improvement in complex testing scenarios, particularly on real-world datasets.


Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks SUPPLEMENTARY DOCUMENT

Neural Information Processing Systems

In Section S4, the relation between compressibility and the tail index is discussed. Proofs of the main results of the paper are presented in Section S5. Finally, the technical lemmas are proved in Section S6. Here we provide a more detailed explanation for our experimental setting, as well as the results and discussion we omitted from the main paper due to space restrictions. Table 1 includes the number of parameters for each model-dataset combination.


Feature maps for the Laplacian kernel and its generalizations

Ahir, Sudhendu, Pandit, Parthe

arXiv.org Machine Learning

Recent applications of kernel methods in machine learning have seen a renewed interest in the Laplacian kernel, due to its stability to the bandwidth hyperparameter in comparison to the Gaussian kernel, as well as its expressivity being equivalent to that of the neural tangent kernel of deep fully connected networks. However, unlike the Gaussian kernel, the Laplacian kernel is not separable. This poses challenges for techniques to approximate it, especially via the random Fourier features (RFF) methodology and its variants. In this work, we provide random features for the Laplacian kernel and its two generalizations: Mat\'{e}rn kernel and the Exponential power kernel. We provide efficiently implementable schemes to sample weight matrices so that random features approximate these kernels. These weight matrices have a weakly coupled heavy-tailed randomness. Via numerical experiments on real datasets we demonstrate the efficacy of these random feature maps.


A Fast and Simple Algorithm for computing the MLE of Amplitude Density Function Parameters

Teimouri, Mahdi

arXiv.org Machine Learning

Over the last decades, the family of $\alpha$-stale distributions has proven to be useful for modelling in telecommunication systems. Particularly, in the case of radar applications, finding a fast and accurate estimation for the amplitude density function parameters appears to be very important. In this work, the maximum likelihood estimator (MLE) is proposed for parameters of the amplitude distribution. To do this, the amplitude data are \emph{projected} on the horizontal and vertical axes using two simple transformations. It is proved that the \emph{projected} data follow a zero-location symmetric $\alpha$-stale distribution for which the MLE can be computed quite fast. The average of computed MLEs based on two \emph{projections} is considered as estimator for parameters of the amplitude distribution. Performance of the proposed \emph{projection} method is demonstrated through simulation study and analysis of two sets of real radar data.


Infinitely wide limits for deep Stable neural networks: sub-linear, linear and super-linear activation functions

Bordino, Alberto, Favaro, Stefano, Fortini, Sandra

arXiv.org Artificial Intelligence

There is a growing literature on the study of large-width properties of deep Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed parameters or weights, and Gaussian stochastic processes. Motivated by some empirical and theoretical studies showing the potential of replacing Gaussian distributions with Stable distributions, namely distributions with heavy tails, in this paper we investigate large-width properties of deep Stable NNs, i.e. deep NNs with Stable-distributed parameters. For sub-linear activation functions, a recent work has characterized the infinitely wide limit of a suitable rescaled deep Stable NN in terms of a Stable stochastic process, both under the assumption of a ``joint growth" and under the assumption of a ``sequential growth" of the width over the NN's layers. Here, assuming a ``sequential growth" of the width, we extend such a characterization to a general class of activation functions, which includes sub-linear, asymptotically linear and super-linear functions. As a novelty with respect to previous works, our results rely on the use of a generalized central limit theorem for heavy tails distributions, which allows for an interesting unified treatment of infinitely wide limits for deep Stable NNs. Our study shows that the scaling of Stable NNs and the stability of their infinitely wide limits may depend on the choice of the activation function, bringing out a critical difference with respect to the Gaussian setting.


Inference with Multivariate Heavy-Tails in Linear Models

Neural Information Processing Systems

Heavy-tailed distributions naturally occur in many real life problems. Unfortunately, it is typically not possible to compute inference in closed-form in graphical models which involve such heavy tailed distributions. In this work, we propose a novel simple linear graphical model for independent latent random variables, called linear characteristic model (LCM), defined in the characteristic function domain. Using stable distributions, a heavy-tailed family of distributions which is a generalization of Cauchy, L\'evy and Gaussian distributions, we show for the first time, how to compute both exact and approximate inference in such a linear multivariate graphical model. LCMs are not limited to only stable distributions, in fact LCMs are always defined for any random variables (discrete, continuous or a mixture of both).


Deep Stable neural networks: large-width asymptotics and convergence rates

Favaro, Stefano, Fortini, Sandra, Peluchetti, Stefano

arXiv.org Machine Learning

In modern deep learning, there is a recent and growing literature on the interplay between large-width asymptotics for deep Gaussian neural networks (NNs), i.e. deep NNs with Gaussian-distributed weights, and classes of Gaussian stochastic processes (SPs). Such an interplay has proved to be critical in several contexts of practical interest, e.g. Bayesian inference under Gaussian SP priors, kernel regression for infinite-wide deep NNs trained via gradient descent, and information propagation within infinite-wide NNs. Motivated by empirical analysis, showing the potential of replacing Gaussian distributions with Stable distributions for the NN's weights, in this paper we investigate large-width asymptotics for (fully connected) feed-forward deep Stable NNs, i.e. deep NNs with Stable-distributed weights. First, we show that as the width goes to infinity jointly over the NN's layers, a suitable rescaled deep Stable NN converges weakly to a Stable SP whose distribution is characterized recursively through the NN's layers. Because of the non-triangular NN's structure, this is a non-standard asymptotic problem, to which we propose a novel and self-contained inductive approach, which may be of independent interest. Then, we establish sup-norm convergence rates of a deep Stable NN to a Stable SP, quantifying the critical difference between the settings of ``joint growth" and ``sequential growth" of the width over the NN's layers. Our work extends recent results on infinite-wide limits for deep Gaussian NNs to the more general deep Stable NNs, providing the first result on convergence rates for infinite-wide deep NNs.